Stability of Feature Ranking Algorithms on Binary Data

Authors

  • Aqsa Shabbir
  • Kashif Javed
  • Yasmin Ansari
  • Haroon A Babri
Abstract

Stability, or robustness, is a crucial yardstick for analyzing and evaluating feature selection algorithms, which have become indispensable due to unprecedented advancements in knowledge discovery and data management. The stability of a feature selection algorithm is taken as its insensitivity to perturbations in the training data, measured with reference to its performance on all of the training data. In this work, we propose an algorithm for evaluating and quantifying the robustness of feature ranking algorithms and test three feature ranking algorithms (Relief, diff-criterion, and mutual information) on four real-life binary data sets from text mining, handwriting recognition, medical diagnosis, and medicinal sciences. We then analyze the stability profiles of the feature selectors and determine how stability is a desirable characteristic of a feature ranking algorithm. We find that diff-criterion and mutual information outperform Relief in stability.
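The notion of stability described in the abstract can be illustrated with a minimal sketch. It is not the evaluation algorithm proposed in the paper, whose details are not given here. The sketch assumes a mutual-information ranker for binary features, random subsampling as the perturbation, and the Jaccard overlap of top-k feature sets as the similarity measure; the function names and the parameters `k`, `n_trials`, and `keep` are hypothetical.

```python
import math
import random


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and discrete labels."""
    n = len(labels)
    joint = {}
    for x, y in zip(feature, labels):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    px, py = {}, {}
    for (x, y), c in joint.items():
        px[x] = px.get(x, 0) + c
        py[y] = py.get(y, 0) + c
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def top_k_features(rows, labels, k):
    """Indices of the k features with the highest mutual information."""
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def stability(rows, labels, k, n_trials=20, keep=0.8, seed=0):
    """Mean Jaccard overlap between the top-k set chosen on all data
    and the top-k sets chosen on random subsamples (assumed measure)."""
    rng = random.Random(seed)
    reference = top_k_features(rows, labels, k)
    n = len(rows)
    m = max(1, int(keep * n))
    total = 0.0
    for _ in range(n_trials):
        idx = rng.sample(range(n), m)
        subset = top_k_features([rows[i] for i in idx],
                                [labels[i] for i in idx], k)
        total += len(reference & subset) / len(reference | subset)
    return total / n_trials
```

A score of 1.0 means every subsample selects the same top-k features as the full training set; lower values indicate a ranker that is more sensitive to the perturbation.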


Similar resources

Diagnosis of Heart Disease Based on Meta Heuristic Algorithms and Clustering Methods

Data analysis in cardiovascular diseases is difficult due to the large volume of information. Not all features influence the final results, so it is very important to identify the more effective ones. In this study, feature selection with a binary cuckoo optimization algorithm is implemented to reduce the number of features. According to the results, the most appropriate classification fo...


Second-Order Statistical Texture Representation of Asphalt Pavement Distress Images Based on Local Binary Pattern in Spatial and Wavelet Domain

Assessment of pavement distresses is one of the important parts of pavement management systems to adopt the most effective road maintenance strategy. In the last decade, extensive studies have been done to develop automated systems for pavement distress processing based on machine vision techniques. One of the most important structural components of computer vision is the feature extraction met...


The Use of the Binary Bat Algorithm in Improving the Accuracy of Breast Cancer Diagnosis

Introduction: The early diagnosis of breast cancer, a prevalent cancer among women, is a necessity in cancer research since it could simplify the clinical management of other patients. The importance of classifying breast cancer patients into high- or low-risk groups has led research groups in biomedical and informatics departments to evaluate and use computer techniques s...


IFSB-ReliefF: A New Instance and Feature Selection Algorithm Based on ReliefF

The increasing use of the Internet and of technologies such as sensor networks has led to an unnecessary increase in the volume of information. Though this has many benefits, it causes problems such as greater storage space requirements and the need for better processors, as well as for data refinement to remove unnecessary data. Data reduction methods provide ways to select useful data from a large amount of duplicate, incomp...



Publication date: 2014